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Abstract — Recht, Fazel, and Parrilo provided an analogy 
between rank minimization and ^ -norm minimization. Subject 
to the rank-restricted isometry property, nuclear norm mini- 
mization is a guaranteed algorithm for rank minimization. The 
resulting semidefinite formulation is a convex problem but in 
practice the algorithms for it do not scale well to large instances. 
Instead, we explore missing terms in the analogy and propose 
a new algorithm which is computationally efficient and also has 
a performance guarantee. The algorithm is based on the atomic 
decomposition of the matrix variable and extends the idea in 
the CoSaMP algorithm for ^o-norm minimization. Combined 
with the recent fast low rank approximation of matrices based 
on randomization, the proposed algorithm can efficiently handle 
large scale rank minimization problems. 

I. Introduction 

Recent studies in compressed sensing have shown that 
a sparsity prior in the representation of the unknowns can 
guarantee unique and stable solutions to underdetermined 
linear systems. The idea has been generalized to linear rank 
minimization by Recht, Fazel, and Parrilo [1]. Rank minimiza- 
tion has important applications such as matrix completion, 
linear system identification, Euclidean embedding, and image 
compression. 

The rank minimization problem is formally written as: 

rank(X) 
6, 



PI: 



mm 

subject to AX 



for a given linear operator A : C mxn — > C p and b € C p . 

Fazel, Hindi, and Boyd [2] proposed a convex relaxation of 
the rank minimization problem. They minimized the nuclear 
norm which is the sum of all singular values of matrix 

X, and is the convex envelop of the non-convex function 
rank(X). Rank minimization is related to €o- norm minimiza- 
tion, which has been the focus of compressed sensing. Recht, 
Fazel, and Parrilo [1] provided an analogy between the two 
problems and their respective solutions by convex relaxation. 

In the analogy, £i-norm minimization for the £rr norm min- 
imization problem is replaced by nuclear norm minimization 
for PI. Both are efficient algorithms, with guaranteed perfor- 
mance under certain conditions, to solve NP-hard problems: 
£o-norm minimization and rank minimization, respectively. 
The respective conditions are given by the restricted isometry 
property and its generalization. However, whereas £i-norm 
minimization corresponds to a linear program, nuclear norm 
minimization is formulated as a convex semidefinite program 



(SDP). Although there exist polynomial time algorithms to 
solve SDP, in practice they do not scale well to large problems. 

Recently, several authors proposed methods for solving 
large scale SDP derived from rank minimization. These in- 
clude interior point methods for SDP, projected subgradient 
methods, and low -rank parametrization [1] and a customized 
interior point method [3] These methods can solve larger 
rank minimization problems, which the general purpose SDP 
solvers cannot. However, the dimension of the problem is 
still restricted and some of these methods do not guarantee 
convergence to the global minimum. Other methods solve 
nuclear norm minimization in a penalized form using singular 
value thresholding (SVT) [4] or fixed point and Bregman 
iterations [5]. It has been shown that the sequence of solutions 
converges to the solution to nuclear norm minimization as the 
penalty parameter increases. However, an analysis of the con- 
vergence rate is missing and hence the quality of the solution 
obtained by these methods is not guaranteed. Furthermore, 
the efficiency of these methods is restricted to the case of 
an affine (i.e., linear equality) constraint. Meka et. al. [6] 
used multiplicative updates and online convex programming 
to provide an approximate solution to rank minimization. 
However, their result depends on the (unverified) existence of 
an oracle that provides the solution to the rank minimization 
problem with a single linear constraint in constant time. 

For iVnorm minimization, besides ^i-norm minimization, 
there are recent algorithms, which are more efficient and 
also have performance guarantees. These include Compressive 
Sampling Matching Pursuit (CoSaMP) [7] and Subspace Pur- 
suit (SP) [8]. To date, no such algorithms have been available 
for rank minimization. 

In this paper, we propose an iterative algorithm to solve 
the rank minimization problem, which is a generalization Q 
of the CoSaMP algorithm for £o- norm minimization to the 
rank minimization problem. We call this algorithm "Atomic 
Decomposition for Minimum Rank Approximation," abbre- 
viated as ADMiRA. In CoSaMP, the £o-norm minimization 
problem with equality constraints is recast into an ,s-term 
vector approximation problem. Similarly, in ADMiRA we 

1 There is another generalization of CoSaMP, namely model-based CoSaMP 
[9]. However, this generalization addresses a completely different and unre- 
lated problem: sparse vector approximation subject to a special (e.g., tree) 
structure. Furthermore, the extensions of CoSaMP to model-based CoSaMP 
and to ADMiRA are independent: neither one follows from the other, and 
neither one is a special case of the other. 



recast PI into the rank-r matrix approximation problem 

min \\AX - b\L 
P2; xgc-x™ 

subject to rank(X) < r. 

The performance guarantee of ADMiRA states that the ap- 
proximate solution to P2 obtained by ADMiRA coincides with 
the true solution X to PI for any r ^ rank(AT) that satisfies 
the ADMiRA assumptions. 

II. Vector vs Matrix 

A. Preliminaries 

Throughout this paper, we use two vector spaces: the space 
of column vectors C p and the space of matrices C mx ™. For 
C p , the inner product is defined by (x, y)cp = y H x for x, y G 
C p where y H denotes the Hermitian transpose of y, and the 
induced Hilbert-Schmidt norm is the Euclidean or £2 -norm 
given by \\xWl = (a;, x)cp for x G C p . 

For C mx ", the inner product is defined by (X, Y) C mx n — 
tr(Y H X) for X, Y G C mxn , and the induced norm is the 
Frobenius norm given by = (X 7 X)c^x^ for X G 

X n 

B. Atomic Decomposition 

Let S denote the set of all nonzero rank-one matrices in 
C mX ". We can refine S so that any two distinct elements are 
not collinear. The resulting subset O is referred to as the set of 
atomsQ of C mx ™. Then the set of atomic spaces A of C mXTl 
is defined by 

A = {span(» : V G ©}. (1) 

Each subspace V G A is one-dimensional and hence is 
irreducible in the sense that V = V1+V2 for some V± , V% G A 
implies V\ = V2 = V. Since O is a uncountably infinite set 
in a finite dimensional space C mxn , the elements in O are 
not linearly independent. Regardless of the choice of O, A 
is uniquely determined. Without loss of generality, we fix O 
such that all elements have the unit Frobenius norm. 

Given a matrix X G C mxn , its representation X = 
2~2j a j' l Pj as a li near combination of atoms is referred to as an 
atomic decomposition of X. Since O spans C mxn , an atomic 
decomposition of X exists for all X G C Tnx ". A subset 
$ = {ip G O : (ipj,tpk)c mxn = djk} of unit-norm and 
pairwise orthogonal atoms in O will be called an orthonormal 
set of atoms. 

Definition 2.1: Let O be a set of atoms of C mx ™. Given 
X G C mxn , we define atoms(X) 

atomspf ) = argmin{|*| : tcO, X G span(*)} . (2) 

Note that atoms(X) is not unique. 

An orthonormal set atoms (X) is given by the singular value 
decomposition of X. Let X = Y^k±\~ <JkUkvj? denote 
the singular value decomposition of X with singular values 

2 The "atom" in this paper is different from Mallat and Zhang's "atom" 
[10], which is an element in the dictionary, a finite set of vectors. In both 
cases, however, an atom denotes an irreducible quantity. 



in decreasing order. For each k, there exists pk G C such 
that \pk\ = 1 and puUkV^ G O. Then an orthonormal set 
atoms(X) is given by 

atoms(X) = {pkUkVkYk=i (X) - 

Remark 2.2: atoms(X) and rank(X) = |atoms(X)| of 
a matrix X G C mx " are the counterparts of supp(.x) and 
||x|| = |supp(x)| for a vector x G C p , respectively. 

C. Generalized Correlation Maximization 

Recht, Fazel, and Parrilo [1] showed an analogy between 
rank minimization PI and ^o- norm minimization. We consider 
instead the rank-r matrix approximation problem P2 and its 
analogue - the s-term vector approximation problem 

min II At — foil 

P3: zee; 112 

subject to ||x|| $J s. 

In Problem P3, variable x lives in the union of s dimensional 
subspaces of C", each spanned by s elements in the finite 
set E = { e e„}, the standard basis of C". Thus the 
union contains all s-sparse vectors in C™. Importantly, finitely 
many (( n ), to be precise) subspaces participate in the union. 
Therefore, it is not surprising that P3 can be solved exactly by 
exhaustive enumeration, and finite selection algorithms such as 
CoSaMP are applicable. 

In the rank-?' matrix approximation problem P2, the matrix 
variable X lives in the union of subspaces of C mxn , each 
of which is spanned by r atoms in the set O. Indeed, if 
X G <C rnxn is spanned by r atoms in O, then rank(X) ^ r 
by the subadditivity of the rank. Conversely, if rank(X) = r, 
then X is a linear combination of rank-one matrices and 
hence there exist r atoms that span X. Note that uncountably 
infinitely many subspaces participate in the union. Therefore, 
some selection rules in the greedy algorithms for £o-norm min- 
imization and s-term vector approximation do not generalize in 
a straightforward way. None the less, using our formulation of 
the rank-r matrix approximation problem in terms of an atomic 
decomposition, we extend the analogy between the vector and 
matrix cases, and propose a way to generalize these selection 
rules to the rank-r matrix approximation problems. 

First, consider the correlation maximization in greedy algo- 
rithms for the vector case. Matching Pursuit (MP) [10] and 
Orthogonal Matching Pursuit (OMP) [11] choose the index 
k G {1, . . . , 71} that maximizes the correlation \a^{b — Ax)\ 
between the fc-th column of A and the residual in each 
iteration, where x is the solution of the previous iteration. 
Given a set "J, let Pq, denote the projection operator onto 
the subspace spanned by ^ in the corresponding embedding 
space. When ^> = {tp} is a singleton set, will denote 
For example, P &k denotes the projection operator onto 
the subspace in C" spanned by e^. From 

\a%(b-M)\ = \(A H (b~ Ax),e k ) C n\ = \\P Ck A H (b - Ax)\\ 

it follows that maximizing the correlation implies maximizing 
the norm of the projection of the image under A H of the 



residual b — Ax onto the selected one dimensional subspace. 

The following selection rule generalizes the correlation 
maximization to the matrix case. We maximize the norm of 
the projection over all one-dimensional subspaces spanned by 
an atom in O: 



Algorithm 1 ADMiRA 



max (b — AX, Atb)pn 
V-gO x 



max Pj,A*(b-AX) 

i/>eo ^ f 



(3) 

where A* : C p — > C mx ™ denotes the adjoint operator of A. 
By the Eckart- Young Theorem, the basis of the best subspace 
is obtained from the singular value decomposition of M = 
A*(b— AX), as xp = u\Vi , where u\ and v\ are the principal 
left and right singular vectors. 

Remark 2.3: Applying the selection rule © to update X 
recursively leads to greedy algorithms generalizing MP and 
OMP to rank minimization. 

Next, consider the rule in recent algorithms such as 
CoSaMP, and SP. The selection rule chooses the subset J of 
{1, . . . , n} with | J| = s defined by 

\af{b- Ax)\>\af{b- Ax)\, VfceJ,Vj^J. (4) 

This is equivalent to maximizing 

J2\^(b-A£)\ 2 = \\P {ek}keJ A H (b-Ax)\\l. 
fee J 

In other words, selection rule © finds the best subspace 
spanned by s elements in E that maximizes the norm of 
the projection of M = A H (b — Ax) onto that s-dimensional 
subspace. 

The following selection rule generalizes the selection rule 
(|4]i to the matrix case. We maximize the norm of the projection 
over all subspaces spanned by a subset with at most r atoms 
in O: 



{ P^A*{b-AX) : |*| < r} 



max • 

A basis * of the best subspace is again obtained from the 
singular value decomposition of M = A*(b — AX), as 
* = {PkUkv"} r k=1 , where u k and w fc , k = 1, . . . , r are the 
r principal left and right singular vectors, respectively and 
for each k, pk G C satisfies \pk\ = 10. Note that * is an 
orthonormal set although this is not enforced as an explicit 
constraint in the maximization. 

III. Algorithm 

Algorithm Q] describes ADMiRA. Steps @] and |7j involve 
finding a best rank-2r or rank-r approximation to given 
matrix (e.g., by truncating the SVD), while Step [6] involves 
the solution of a linear least-squares problem - all standard 
numerical linear algebra problems. Step merges two given 
sets of atoms in O by taking their union. 

Most steps of ADMiRA are similar to those of CoSaMP 
except Step|4]and Step [7] The feasible sets of the maximization 
problems in Step |4] and Step [7] of ADMiRA are infinite while 
those in the analogous steps of CoSaMP are finite. In CoSaMP, 

3 Once the best subspace is determined, it is not required to compute the 
constants pfc's. 



Input: A : C mxn -> C p , b € C p , and target rank r G N 
Output: rank-r solution X to P2 



X <- 
* <- 

while stop criterion is false do 

<- argmax(\\p^A*(b - AX) 
*co I II 

\&' u $ 

argmin AX\\ 2 : X e span(*)| 



X 



x I 

$ <— are' max <^ 



P^X 



X 

end while 
return X 



a greedy selection is employed to solve the combinatorial 
problem and provides the exact solution owing to the orthog- 
onality of the feasible set. The maximization problem over the 
infinite set in ADMiRA may look even more difficult than the 
combinatorial problem in CoSaMP. However, singular value 
decomposition can solve the maximization problem over the 
infinite set efficiently. 

IV. Main Results: Performance Guarantee 

A. Rank-Restricted Isometry Property (R-RIP) 

Recht et al [1] generalized the sparsity -restricted isometry 
property (RIP) defined for sparse vectors to low rank matrices. 
□ In order to draw the analogy with known results in £q- 
norm minimization, we slightly modify their definition by 
squaring the norm in the inequality. Given a linear operator 
A : C mxn — ► C p , the rank-restricted isometry constant S r (A) 
is defined as the minimum constant that satisfies 

(1 - 5 r (A)) \\X\\ 2 F ^ hAX\\l < (1 + S r (A)) \\X\\ 2 F , (5) 

for all X € C mx " with rank(X) ^ r for some constant 7 > 
0. Throughout this paper, we assume that the linear operator 
A is scaled appropriately so that 7 = 1 in (O ■ 

B. Performance Guarantee 

Subject to the R-RIP, the Atomic Decomposition for Min- 
imum Rank Approximation Algorithm (ADMiRA) has a per- 
formance guarantee analogous to that of CoSaMP. 

The followings are the assumptions in ADMiRA: 



Al 

A2 
A3 



The target rank is fixed as r. 

The linear operator A satisfies Sr r (A) ^ 0.043. 

The measurement is obtained by 

b = AX + v, 



(6) 



4 They also demonstrated "nearly isometric families" satisfying this R- 
RIP (with overwhelming probability). These include random linear operators 
generated from i.i.d. Gaussian or i.i.d. symmetric Bernoulli distributions. 

5 If 7 ^ 1, then only the constant for the noise gain will be scaled 
accordingly. 



where v is the discrepancy between the measurement 
and the linear model AX. 

Assumption A2 plays a key role in deriving the performance 
guarantee of ADMiRA. This enforces the rank-restricted isom- 
etry property of the linear operator A. Although the verifica- 
tion of the satisfiability of A2 is as difficult as or more difficult 
than the recovery problem itself, nearly isometric families that 
satisfy the condition in A2 have been demonstrated [1]. 

The performance guarantees are specified in terms of a mea- 
sure of inherent approximation error, termed unrecoverable 
energy defined by 

e=\\X-X r \\ F + ±=\\X-X r \\. + \\v\\ 2 , 
v r 

where X r denotes the best rank-r approximation of X. The 
first two terms in e define a metric of the minimum distance 
between the "true" matrix X and a rank-r matrix. This is 
analogous to the notion of a measure of compressibility of 
a vector in sparse vector approximation. No solution of P2 
can come any closer to X. The third term is the norm of the 
measurement noise, which must also limit the accuracy of the 
approximation provided by a solution to P2. 

Theorem 4.1: Let X k denote the estimate of X in the k- 
th iteration of ADMiRA. For each k ^ 0, X k satisfies the 
following recursion: 

\\X-X k+1 \\ F 0.5||X-X fc || F + 10e, 

where e is the unrecoverable energy. From the above relation, 
it follows that 

\\X-X k \\ F ^ 2- fe ||X|| F + 20e, Vfc^O. 

Theorem 14.11 shows the geometric convergence of ADMiRA. 
In fact, convergence in a finite number of steps can be achieved 
as stated by the following theorem. 

Theorem 4.2: After at most 6(r + 1) iterations, ADMiRA 
provides a rank-r approximation X of X, which satisfies 

\\X-X\\ F ^ 20e, 

where e is the unrecoverable energy. 

Depending on the spectral properties of the matrix X, even 
faster convergence is possible. 

C. Relationship between PI, P2, and ADMiRA 

The approximation X given by ADMiRA is a solution to 
P2. When there is no noise in the measurement, i.e., b = AX, 
where X is the solution to PI, Theorem 14. 1 1 states that if the 
ADMiRA assumptions are satisfied with r ^ rank(JT), then 
X = X. An appropriate value can be assigned to r by an 
incremental search over r. 

For the noisy measurement case, the linear constraint in PI 
is replaced by a quadratic constraint and the rank minimization 
problem is written as: 

min rank(X) 

Pl' : XGC™X" 

subject to \\AX — 6|| 9 ^ rj. 



Let X' denote a minimizer to PI'. In this case, the approxi- 
mation X produced by ADMiRA is not necessarily equivalent 
to X' , but by Theorem 14.11 the distance between the two is 
bounded by \\X' - X\\ F 20?? for all r ^ rank(X') that 
satisfies the ADMiRA assumptions. 

V. Properties of the Rank-Restricted Isometry 

We introduce a number of properties of the rank-restricted 
isometry. These properties serve as key tools for proving the 
performance guarantees for ADMiRA in this paper. These 
properties further extend the analogy between the sparse vector 
and the low-rank matrix approximation problems (P3 and 
P2, respectively), and are therefore also of interest in their 
own right. An operator satisfying the R-RIP satisfies, as a 
consequence, a number of other properties when composed 
with other linear operators defined by the atomic decompo- 
sition. Most properties are inherited from the vector case. 
However, the generalization of the restricted orthogonality 
property to the matrix case is not straightforward and shows 
some nontrivial differences. The following Proposition is an 
extension of Lemma 2.1 in [12] for the vector case to the 
matrix case. 

Proposition 5.1: Suppose that linear operator A : C mx " — > 
C p has the rank-restricted isometry constant 5 r (A). Let 
X,Y € C mxn such that (X,Y) Cm *. n = and rank(X) + 
rank(y) ^ r. Then 

\(AX,AY)cp \ ^ V2S r (A)\\X\\ F \\Y\\ F . (7) 

Remark 5.2: For the real matrix case, Proposition 15.11 can 
be improved by dropping the constant s/2. This improvement 
is achieved by replacing the parallelogram identity in the proof 
to the version for the real scalar field case. 

Remark 5.3: For the vector case, the representation of a 
vector x € C" in terms of the standard basis {ej}™ =1 of C™ 
determines \\x\\ . Let J\, J2 C {1, . . . ,n} be arbitrary. Then 
the projection operators P{ ej } jeJl an d P{ej}j e ,^ commute. 
Furthermore, P^ e .y eJ x is s-sparse (or sparser) if x is ,s- 
sparse. These properties follow from the orthogonality of 
the standard basis. Proposition 3.2 in [7], corresponding in 
the vector case to our Proposition 15.11 requires these two 
properties. However, these properties do not hold for the 
matrix case. For "Ji,^ C O, the projection operators Pq fl 
and P-q/ 2 do not commute in general and rank(P^X) can be 
greater than r even though rank(X) ^ r. Proposition 15.11 is 
a stronger version of the corresponding proposition for the 
vector case in the sense that it requires a weaker condition 
(orthogonality between two low-rank matrices), which can be 
satisfied without these properties. 

VI. Implementation and Scalability 

Most of the computation in ADMiRA lies in the truncated 
singular value decomposition. The fact that ADMiRA keeps 
the matrix variables in their atomic decomposition is advanta- 
geous for this procedure. Only a few dominant singular triplets 
are necessary, which can be computed by the Lanczos method 
in 0((m + n)rL) time, where L is the number of the iterations 



that depends on the singular value distribution.. An alternative 
approach is to use a randomized algorithm [14] that computes 
the low-rank approximation of a given matrix in atomic 
decomposed form in 0((m + n)r 3 logr) time. In this case, 
ADMiRA has complexity of 0((m + n)r 3 logr) per iteration, 
or 0((m+n)r 4 logr) to achieve the guarantee in Theorem l4.2l 
and scales well to large problems. Another consideration is the 
computation of the proxy matrix. This involves applying A 
and A*, the complexity of which is 0(rpmn). If A consists 
of sparse matrices, then the complexity can be as small as 
0(rp(m+n)). In particular, in the matrix completion problem, 
AX is sampling the entries of matrix X and hence there is 
no multiplication in this procedure. 

VII. Numerical Experiment 

We study reconstructions by ADMiRA with a generic ma- 
trix completion example. Our preliminary Matlab implemen- 
tation uses ARPACK [13] to compute partial SVDs in Steps g] 
and E] of ADMiRA. The test matrix X G W ixn is generated 
as the product X = YiXr where Yl,Yr e R nxr has entries 
following an i.i.d. Gaussian distribution. The measurement b is 
p randomly chosen entries of X, which may be contaminated 
with an additive white Gaussian noise. The reconstruction 
error and measurement noise level are measured in terms of 
SNR rccon 4 201og 10 (||X||^/||X-X|| F ) and SNR mcas 4 
201og 10 (||6|| 2 / IMI2)' respectively. Computational efficiency 
is measured by the number of iterations. The results in Fig. [T] 
and Table U have been averaged over 20 trials. 

Fig. [T] shows that both SNR rocon and the number of iter- 
ations improve as p/d r increases. Here d r is the number of 
degrees of freedom defined by d r = r(n + m — r) and denotes 
the essential number of unknowns. Fig. [T] suggests that we 
need p/d r 20 for n = 500. 

Table Q] shows that ADMiRA provides slightly better per- 
formance with less computation than SVT [4]. Roughly, the 
computational complexity of a single iteration of ADMiRA 
can be compared to three times of that of SVT. 

Fig. H] compares the phase transitions of ADMiRA and 
SVT. We count the number of successful matrix completion 
(SNR rcC on 70dB) out of 10 trials for each triplet (n,p,r). 
The brighter color implies more success. ADMiRA performed 
better than SVT for this example. 




10 20 30 40 50 10 20 30 40 50 

p/d r p/d r 

Fig. 1. Matrix completion by ADMiRA: n = 500, r = 2. 



We note though, that the performance guarantee in the previ- 
ous sections is not directly applicable to the experiments in this 
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Fig. 2. Phase transition of matrix completion: n = m = 100. 
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p/d r 


SNR, rccoil 


(dB) 


#iter 


r 


ADMiRA 


SVT 


ADMiRA 


SVT 


2 


0.20 


50.05 


82 


79 


11 


54 


5 


0.20 


20.05 


81 


78 


15 


64 


10 


0.20 


10.05 


79 


77 


19 


80 



TABLE I 



Comparison of ADMiRA and SVT: no noise, n = m = 1000. 

section since the linear operator in the matrix completion does 
not satisfy the R-RIR It seems that a performance guarantee 
without using the R-RIP might be possible. 

VIII. Conclusion 

We propose a new algorithm, ADMiRA, which extends both 
the efficiency and the performance guarantee of the CoSaMP 
algorithm for £rr norm minimization to matrix rank minimiza- 
tion. The proposed generalized correlation maximization can 
be also applied to MP, OMP, and SP to similarly extend 
the known algorithms and theory from the s-term vector 
approximation problem to the rank-r matrix approximation. 
ADMiRA can handle large scale rank minimization problems 
efficiently by using recent linear time algorithms for low rank 
approximation. More detailed arguments and missing proofs 
are available in [15]. 
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